
IF-Font: Ideographic Description Sequence-Following Font Generation

Neural Information Processing Systems

Few-shot font generation (FFG) aims to learn the target style from a limited number of reference glyphs and generate the remaining glyphs in the target font. Previous works focus on disentangling the content and style features of glyphs, combining the content features of the source glyph with the style features of the reference glyph to generate new glyphs. However, the disentanglement is challenging due to the complexity of glyphs, often resulting in glyphs that are influenced by the style of the source glyph and prone to artifacts. We propose IF-Font, a novel paradigm that incorporates the Ideographic Description Sequence (IDS) instead of the source glyph to control the semantics of generated glyphs. To achieve this, we quantize the reference glyphs into tokens and model the token distribution of target glyphs using the corresponding IDS and reference tokens. The proposed method excels in synthesizing glyphs with neat and correct strokes, and enables the creation of new glyphs based on provided IDS. Extensive experiments demonstrate that our method greatly outperforms state-of-the-art methods in both one-shot and few-shot settings, particularly when the target styles differ significantly from the training font styles.
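
As a concrete illustration of the paradigm, the sketch below shows one plausible way to condition an autoregressive token model on an IDS and on quantized reference-glyph tokens. All module sizes, vocabularies, and the transformer layout are illustrative assumptions, not the paper's architecture.

```python
# Minimal sketch of the IF-Font idea (not the authors' code): target glyph
# tokens are modeled autoregressively, conditioned on the IDS of the target
# character plus VQ tokens of the few-shot reference glyphs.
import torch
import torch.nn as nn

class IDSConditionedGlyphModel(nn.Module):
    def __init__(self, ids_vocab=1000, glyph_vocab=1024, d_model=256):
        super().__init__()
        self.ids_emb = nn.Embedding(ids_vocab, d_model)      # IDS symbols: components, layout operators
        self.glyph_emb = nn.Embedding(glyph_vocab, d_model)  # VQ codebook indices of glyph tokens
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=4)
        self.head = nn.Linear(d_model, glyph_vocab)

    def forward(self, ids_seq, ref_tokens, tgt_tokens):
        # Condition on the IDS (semantics) and reference tokens (style) via cross-attention.
        memory = torch.cat([self.ids_emb(ids_seq), self.glyph_emb(ref_tokens)], dim=1)
        tgt = self.glyph_emb(tgt_tokens)
        mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(1))
        out = self.decoder(tgt, memory, tgt_mask=mask)
        return self.head(out)  # next-token logits over the glyph codebook

model = IDSConditionedGlyphModel()
ids_seq = torch.randint(0, 1000, (2, 6))      # IDS of the target characters
ref_tokens = torch.randint(0, 1024, (2, 64))  # tokens of the reference glyphs
tgt_tokens = torch.randint(0, 1024, (2, 64))  # target glyph tokens (teacher forcing)
print(model(ids_seq, ref_tokens, tgt_tokens).shape)  # torch.Size([2, 64, 1024])
```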


Adversarial Style Augmentation for Domain Generalized Urban-Scene Segmentation

Neural Information Processing Systems

In this paper, we consider the problem of domain generalization in semantic segmentation, which aims to learn a robust model using only labeled synthetic (source) data. The model is expected to perform well on unseen real (target) domains. Our study finds that image style variation can largely influence the model's performance, and that style features can be well represented by the channel-wise mean and standard deviation of images. Inspired by this, we propose a novel adversarial style augmentation (AdvStyle) approach, which dynamically generates hard stylized images during training and thus effectively prevents the model from overfitting to the source domain. Specifically, AdvStyle regards the style feature as a learnable parameter and updates it by adversarial training. The learned adversarial style feature is used to construct an adversarial image for robust model training. AdvStyle is easy to implement and can be readily applied to different models. Experiments on two synthetic-to-real semantic segmentation benchmarks demonstrate that AdvStyle significantly improves model performance on unseen real domains and achieves state-of-the-art results. Moreover, AdvStyle can be applied to domain generalized image classification, producing a clear improvement on the considered datasets.
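
The core mechanism lends itself to a short sketch: treat the channel-wise mean and standard deviation as learnable style parameters and update them by gradient ascent on the task loss. This is a minimal reading of the idea, assuming a single adversarial step with a sign-gradient update; `model`, `images`, `labels`, and the step size `eps` are placeholders, not the paper's exact settings.

```python
# Hedged sketch in the spirit of AdvStyle: the per-channel mean/std of an
# image is treated as a learnable style parameter and updated to *increase*
# the task loss, yielding hard stylized images for robust training.
import torch
import torch.nn.functional as F

def adv_style_augment(model, images, labels, eps=0.1):
    mu = images.mean(dim=(2, 3), keepdim=True)           # per-channel mean (style)
    sigma = images.std(dim=(2, 3), keepdim=True) + 1e-6  # per-channel std (style)
    normed = (images - mu) / sigma                       # content with style removed
    adv_mu = mu.clone().requires_grad_(True)
    adv_sigma = sigma.clone().requires_grad_(True)

    # One adversarial step: raise the loss w.r.t. the style parameters only.
    loss = F.cross_entropy(model(normed * adv_sigma + adv_mu), labels)
    g_mu, g_sigma = torch.autograd.grad(loss, [adv_mu, adv_sigma])
    with torch.no_grad():
        adv_mu += eps * g_mu.sign()
        adv_sigma += eps * g_sigma.sign()

    # Re-stylize the content with the adversarial style and train on it.
    return (normed * adv_sigma + adv_mu).detach()
```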


IXAM: Interactive Explainability for Authorship Attribution Models

Alshomary, Milad, Bhatnagar, Anisha, Zeng, Peter, Muresan, Smaranda, Rambow, Owen, McKeown, Kathleen

arXiv.org Artificial Intelligence

We present IXAM, an Interactive eXplainability framework for Authorship Attribution Models. Given an authorship attribution (AA) task and an embedding-based AA model, our tool enables users to interactively explore the model's embedding space and construct an explanation of the model's prediction as a set of writing style features at different levels of granularity. Through a user evaluation, we demonstrate the value of our framework compared to predefined stylistic explanations.
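
For intuition only, here is a heavily simplified sketch of how an explanation might be built from interpretable style features. IXAM itself is an interactive tool over an embedding-based model; the feature set, nearest-profile prediction, and ranking rule below are all assumptions for illustration.

```python
# Toy sketch (not IXAM's implementation): predict the author whose style
# profile is most similar to the query document, then explain the prediction
# by the interpretable style features that match that profile most closely.
import numpy as np

STYLE_FEATURES = ["avg_sentence_len", "comma_rate", "rare_word_rate"]  # hypothetical

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))

def attribute_and_explain(query_vec, author_profiles, top_k=2):
    pred = max(author_profiles, key=lambda a: cosine(query_vec, author_profiles[a]))
    gaps = np.abs(query_vec - author_profiles[pred])  # small gap = feature agrees
    order = np.argsort(gaps)[:top_k]
    return pred, [STYLE_FEATURES[i] for i in order]

profiles = {"A": np.array([18.0, 0.9, 0.02]), "B": np.array([11.0, 0.4, 0.10])}
print(attribute_and_explain(np.array([17.5, 0.85, 0.05]), profiles))
# ('A', ['rare_word_rate', 'comma_rate'])
```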


The Anatomy of Alignment: Decomposing Preference Optimization by Steering Sparse Features

Ferrao, Jeremias, van der Lende, Matthijs, Lichkovski, Ilija, Neo, Clement

arXiv.org Artificial Intelligence

Prevailing alignment methods induce opaque parameter changes, obscuring what models truly learn. To address this, we introduce Feature Steering with Reinforcement Learning (FSRL), a framework that trains a lightweight adapter to steer model behavior by modulating interpretable sparse features. First, we theoretically demonstrate that this mechanism is expressive enough to approximate the behavioral shifts of post-training processes. We then apply FSRL to preference optimization and perform a causal analysis of the learned policy. Our analysis reveals a crucial insight: the model learns to reward stylistic presentation as a proxy for quality, disproportionately relying on features related to style and formatting over those tied to alignment concepts like honesty. By effectively optimizing the preference objective, FSRL serves as a transparent proxy for observing the alignment process. Overall, FSRL offers an interpretable control interface and a practical way to diagnose how preference optimization pressures manifest at the feature level.
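
Below is a minimal sketch of the steering mechanism described above, under the assumption that the sparse features come from a sparse autoencoder (SAE) whose decoder directions can be added back into a frozen model's hidden state. Sizes and module names are illustrative; in FSRL the adapter would be trained with the preference objective, not left at the random initialization shown here.

```python
# Hedged sketch of feature steering: a lightweight adapter emits sparse,
# non-negative coefficients over interpretable features, and the weighted
# decoder directions are added into the residual stream of a frozen LM.
import torch
import torch.nn as nn

class FeatureSteeringAdapter(nn.Module):
    def __init__(self, d_model=768, n_features=16384):
        super().__init__()
        self.gate = nn.Linear(d_model, n_features)  # the lightweight adapter
        # Stand-in for a frozen SAE decoder; each row is one feature direction.
        self.decoder = nn.Parameter(torch.randn(n_features, d_model) * 0.02,
                                    requires_grad=False)

    def forward(self, hidden):
        coeffs = torch.relu(self.gate(hidden))      # sparse steering coefficients
        return hidden + coeffs @ self.decoder       # steer the hidden state

adapter = FeatureSteeringAdapter()
hidden = torch.randn(1, 12, 768)   # (batch, seq, d_model) from a frozen LM layer
print(adapter(hidden).shape)       # torch.Size([1, 12, 768])
```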



AuthSig: Safeguarding Scanned Signatures Against Unauthorized Reuse in Paperless Workflows

Zhang, RuiQiang, Ma, Zehua, Wang, Guanjie, Liu, Chang, Wang, Hengyi, Zhang, Weiming

arXiv.org Artificial Intelligence

With the deepening trend of paperless workflows, signatures as a means of identity authentication are gradually shifting from traditional ink-on-paper to electronic formats. Despite the availability of dynamic pressure-sensitive and PKI-based digital signatures, static scanned signatures remain prevalent in practice due to their convenience. However, these static images, having almost lost their authentication attributes, cannot be reliably verified and are vulnerable to malicious copying and reuse. To address these issues, we propose AuthSig, a novel static electronic signature framework based on generative models and watermarking, which binds authentication information to the signature image. Leveraging the human visual system's insensitivity to subtle style variations, AuthSig finely modulates style embeddings during generation to implicitly encode watermark bits, enforcing a One Signature, One Use policy. To overcome the scarcity of handwritten signature data and the limitations of traditional augmentation methods, we introduce a keypoint-driven data augmentation strategy that effectively enhances style diversity to support robust watermark embedding. Experimental results show that AuthSig achieves over 98% extraction accuracy under both digital-domain distortions and signature-specific degradations, and remains effective even in print-scan scenarios.
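
To make the watermarking idea concrete, the toy sketch below encodes bits as small signed nudges to a style embedding and recovers them by comparison with the unmarked embedding. This is an assumed, simplified stand-in for AuthSig's modulation scheme; the real system embeds bits through a generative model and must survive print-scan distortions.

```python
# Toy sketch of "bits via subtle style modulation" (assumed details): each
# watermark bit shifts one dimension of the signature's style embedding by a
# visually negligible +/- delta, recoverable against the base embedding.
import numpy as np

def embed_bits(style_vec, bits, delta=0.01):
    out = style_vec.copy()
    for i, b in enumerate(bits):           # one embedding dimension per bit
        out[i] += delta if b else -delta
    return out

def extract_bits(marked_vec, base_vec, n_bits):
    return [int(marked_vec[i] > base_vec[i]) for i in range(n_bits)]

base = np.random.randn(128)                # style embedding of a signature
bits = [1, 0, 1, 1, 0, 0, 1, 0]            # authentication payload
marked = embed_bits(base, bits)
assert extract_bits(marked, base, len(bits)) == bits
```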


MemoryTalker: Personalized Speech-Driven 3D Facial Animation via Audio-Guided Stylization

Kim, Hyung Kyu, Lee, Sangmin, Kim, Hak Gu

arXiv.org Artificial Intelligence

Speech-driven 3D facial animation aims to synthesize realistic facial motion sequences from given audio, matching the speaker's speaking style. However, previous works often require priors such as class labels of a speaker or additional 3D facial meshes at inference, which makes them fail to reflect the speaking style and limits their practical use. To address these issues, we propose MemoryTalker, which enables realistic and accurate 3D facial motion synthesis from audio input alone, maximizing its usability in applications. Our framework consists of two training stages: the first stores and retrieves general motion (i.e., Memorizing), and the second performs personalized facial motion synthesis (i.e., Animating) with the motion memory stylized by an audio-driven speaking style feature. In this second stage, our model learns which facial motion types should be emphasized for a particular piece of audio. As a result, MemoryTalker can generate reliable personalized facial animation without additional prior information. With quantitative and qualitative evaluations, as well as a user study, we show the effectiveness of our model and its performance gains for personalized facial animation over state-of-the-art methods.
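
As a rough sketch of the memory mechanism, the snippet below addresses a bank of general motion prototypes with an audio feature and then stylizes the retrieved motion with an audio-derived scale and shift. Slot count, dimensions, and the modulation form are assumptions rather than MemoryTalker's actual design.

```python
# Hedged sketch of an audio-addressed motion memory: per-frame audio features
# attend over learned motion keys; the retrieved general motion is then
# modulated by an utterance-level, audio-driven speaking-style feature.
import torch
import torch.nn as nn

class MotionMemory(nn.Module):
    def __init__(self, n_slots=64, d_audio=256, d_motion=256):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(n_slots, d_audio))     # addressing keys
        self.values = nn.Parameter(torch.randn(n_slots, d_motion))  # general motion prototypes
        self.style_mod = nn.Linear(d_audio, 2 * d_motion)           # style -> scale & shift

    def forward(self, audio_feat, style_feat):
        attn = torch.softmax(audio_feat @ self.keys.T, dim=-1)      # (T, n_slots)
        motion = attn @ self.values                                 # retrieved general motion
        scale, shift = self.style_mod(style_feat).chunk(2, dim=-1)
        return motion * (1 + scale) + shift                         # stylized motion feature

mem = MotionMemory()
audio = torch.randn(10, 256)    # per-frame audio features
style = torch.randn(1, 256)     # utterance-level speaking-style feature
print(mem(audio, style).shape)  # torch.Size([10, 256])
```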